Using SQL primitives and parallel DB servers to speed up knowledge discovery in large relational databases
Abstract
Efficiency is crucial in KDD (Knowledge Discovery in Databases) because of the huge amount of data stored in commercial databases. We argue that high efficiency in KDD can be achieved by combining two approaches: mapping KDD functionality onto standard DBMS operations, and executing KDD tasks on a parallel SQL server. We propose generic KDD primitives that underlie the candidate-rule evaluation procedures of many KDD algorithms, and we evaluate the speedup achieved by a parallel SQL server when executing a decision-tree learning algorithm implemented via these primitives.
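The idea of mapping candidate-rule evaluation onto standard DBMS operations can be illustrated with a small sketch. It is not the authors' implementation: the table `examples`, its columns, the toy rows, and the helper names `count_by_group`, `weighted_entropy`, and `best_split` are all assumptions made for illustration, and SQLite stands in for the parallel SQL server. The sketch assumes a primitive that returns class counts per attribute value through a single GROUP BY query, which a decision-tree learner could then use to score candidate splits.

```python
import math
import sqlite3

# Hypothetical toy data; column names and rows are illustrative only.
ROWS = [
    ("sunny", "high", "no"),
    ("sunny", "normal", "yes"),
    ("rain", "high", "no"),
    ("rain", "normal", "yes"),
    ("overcast", "high", "yes"),
    ("overcast", "normal", "yes"),
]
ATTRIBUTES = ("outlook", "humidity")


def make_db():
    """Create an in-memory database holding the toy training examples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE examples (outlook TEXT, humidity TEXT, label TEXT)")
    conn.executemany("INSERT INTO examples VALUES (?, ?, ?)", ROWS)
    return conn


def count_by_group(conn, attribute):
    """Assumed primitive: class counts per attribute value, computed by the DBMS."""
    # Column names cannot be bound as SQL parameters, so check a whitelist.
    if attribute not in ATTRIBUTES:
        raise ValueError(f"unknown attribute: {attribute}")
    sql = (
        f"SELECT {attribute}, label, COUNT(*) "
        f"FROM examples GROUP BY {attribute}, label"
    )
    counts = {}
    for value, label, n in conn.execute(sql):
        counts.setdefault(value, {})[label] = n
    return counts


def entropy(class_counts):
    """Shannon entropy (in bits) of a class-count distribution."""
    total = sum(class_counts.values())
    return -sum((n / total) * math.log2(n / total) for n in class_counts.values() if n)


def weighted_entropy(counts):
    """Average entropy of the partitions induced by one attribute."""
    total = sum(sum(c.values()) for c in counts.values())
    return sum(sum(c.values()) / total * entropy(c) for c in counts.values())


def best_split(conn):
    """Pick the attribute whose partitions have the lowest weighted entropy."""
    return min(ATTRIBUTES, key=lambda a: weighted_entropy(count_by_group(conn, a)))
```

In this sketch the counting, which touches every row, is done inside the database, while the client only combines a handful of aggregated counts. Under the same assumption, a parallel server could evaluate the GROUP BY over partitions of the table without any change to the client code.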
Similar resources
Knowledge Discovery in Spatial Databases
Both the number and the size of spatial databases, such as geographic or medical databases, are growing rapidly because of the large amount of data obtained from satellite images, computer tomography, or other scientific equipment. Knowledge discovery in databases (KDD) is the process of discovering valid, novel, and potentially useful patterns from large databases. Typical tasks for knowledge d...
Towards Large-Scale Knowledge Discovery in Databases (KDD) by Exploiting Parallelism in Generic KDD Primitives
Efficiency and scalability are crucial issues in Knowledge Discovery in Databases (KDD). Our approach to these challenging issues is to devise generic, set-based KDD primitives which are insensitive to the order in which data elements are processed. Such primitives facilitate the exploitation of parallelism. Furthermore, these primitives are generic in that they support a wide selection of rule...
EquipAsso: An Algorithm based on New Relational Algebraic Operators for Association Rules Discovery
The search for interesting relationships among data has always been a research focus in data mining. The overall performance of mining association rules is determined by the discovery of the large itemsets, i.e., the itemsets that have support above a predetermined minimum support. The algorithms proposed for association rules show different approaches to generate all large...
Fast Join Execution Using Summary Information in Large Databases
It is well known that query execution in relational databases is not fast and that join execution is generally the most expensive operation in query execution. All three major join methods, namely nested loops, sorting, and hashing, have been perfected to the extent that any further improvement in these methods will enhance the performance of join execution only marginally. Even with the advent of...
Query Languages Supporting Descriptive Rule Mining: A Comparative Study
Recently, inductive databases (IDBs) have been proposed to tackle the problem of knowledge discovery from huge databases. With an IDB, the user/analyst performs a set of very different operations on data using a query language, powerful enough to support all the required manipulations, such as data preprocessing, pattern discovery and pattern post-processing. We provide a comparison between thr...
Publication date: 1996